
    A Stochastic Tensor Method for Non-convex Optimization

    We present a stochastic optimization method that uses a fourth-order regularized model to find local minima of smooth and potentially non-convex objective functions with a finite-sum structure. This algorithm uses sub-sampled derivatives instead of exact quantities. The proposed approach is shown to find an $(\epsilon_1,\epsilon_2,\epsilon_3)$-third-order critical point in at most $\mathcal{O}\left(\max\left(\epsilon_1^{-4/3}, \epsilon_2^{-2}, \epsilon_3^{-4}\right)\right)$ iterations, thereby matching the rate of deterministic approaches. In order to prove this result, we derive a novel tensor concentration inequality for sums of tensors of any order that makes explicit use of the finite-sum structure of the objective function.
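
    For orientation, a fourth-order regularized model of the kind the abstract mentions is commonly written as a third-order Taylor expansion plus a quartic penalty. The abstract does not give the exact form, so the notation below is an assumption: $g_k$, $H_k$ and $T_k$ stand for sub-sampled estimates of the gradient, Hessian and third-derivative tensor at the iterate $x_k$, and $\sigma_k > 0$ is a regularization weight.

```latex
% Assumed form of the regularized model; symbols are illustrative, not taken from the paper.
m_k(s) \;=\; f(x_k) \;+\; g_k^{\top} s \;+\; \tfrac{1}{2}\, s^{\top} H_k\, s
        \;+\; \tfrac{1}{6}\, T_k[s]^{3} \;+\; \tfrac{\sigma_k}{4}\, \lVert s \rVert^{4}
```

    Under this reading, each iteration would approximately minimize $m_k(s)$ over the step $s$ and set $x_{k+1} = x_k + s$.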

    Stabilizing Training of Generative Adversarial Networks through Regularization

    Deep generative models based on Generative Adversarial Networks (GANs) have demonstrated impressive sample quality, but in order to work they require a careful choice of architecture, parameter initialization, and hyper-parameters. This fragility is in part due to a dimensional mismatch or non-overlapping support between the model distribution and the data distribution, causing their density ratio and the associated f-divergence to be undefined. We overcome this fundamental limitation and propose a new regularization approach with low computational cost that yields a stable GAN training procedure. We demonstrate the effectiveness of this regularizer across several architectures trained on common benchmark image generation tasks. Our regularization turns GAN models into reliable building blocks for deep learning.

    Variance Reduced Stochastic Gradient Descent with Neighbors

    Stochastic Gradient Descent (SGD) is a workhorse in machine learning, yet its slow convergence can be a computational bottleneck. Variance reduction techniques such as SAG, SVRG and SAGA have been proposed to overcome this weakness, achieving linear convergence. However, these methods are either based on computations of full gradients at pivot points, or on keeping per data point corrections in memory. Therefore, speed-ups relative to SGD may need a minimal number of epochs in order to materialize. This paper investigates algorithms that can exploit neighborhood structure in the training data to share and re-use information about past stochastic gradients across data points, which offers advantages in the transient optimization phase. As a side-product we provide a unified convergence analysis for a family of variance reduction algorithms, which we call memorization algorithms. We provide experimental results supporting our theory. Comment: Appears in: Advances in Neural Information Processing Systems 28 (NIPS 2015). 13 pages.
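
    To make the phrase "full gradients at pivot points" concrete, here is a minimal sketch of a plain SVRG-style loop on a toy least-squares problem. It is not the neighborhood-sharing method the paper proposes; the function names, the toy data, the step size and the epoch count are all illustrative assumptions.

```python
import random

def grad_i(w, x, y):
    """Gradient of the per-example loss 0.5 * (w.x - y)^2 with respect to w."""
    residual = sum(wj * xj for wj, xj in zip(w, x)) - y
    return [residual * xj for xj in x]

def full_grad(w, data):
    """Average of the per-example gradients over the whole data set."""
    total = [0.0] * len(w)
    for x, y in data:
        total = [t + g for t, g in zip(total, grad_i(w, x, y))]
    return [t / len(data) for t in total]

def svrg(data, dim, step=0.1, epochs=50, seed=0):
    """SVRG-style sketch: one full gradient per epoch at a pivot point,
    then cheap stochastic steps corrected by the pivot gradients."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(epochs):
        pivot = list(w)
        mu = full_grad(pivot, data)  # full gradient at the pivot point
        for _ in range(len(data)):
            x, y = data[rng.randrange(len(data))]
            g_w = grad_i(w, x, y)      # stochastic gradient at current iterate
            g_p = grad_i(pivot, x, y)  # same example, evaluated at the pivot
            # Variance-reduced direction: g_w - g_p + mu
            w = [wj - step * (a - b + m)
                 for wj, a, b, m in zip(w, g_w, g_p, mu)]
    return w

# Hypothetical noise-free toy data generated by y = 2*x0 - 1*x1.
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0),
        ([1.0, 1.0], 1.0), ([1.0, -1.0], 3.0)]
w = svrg(data, dim=2)
```

    Each epoch pays for one full pass to compute `mu` before any variance-reduced step is taken, which illustrates why the abstract notes that speed-ups over SGD may only materialize after a minimal number of epochs.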